5.1 Imbalanced dataset
5.1.1 Logistic model
## [1] "Number of rows: 1147"
## [1] "Number of columns: 35"
## [1] "Number of missing values: 0"
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 204 36
## 1 2 1
##
## Accuracy : 0.8436
## 95% CI : (0.7917, 0.8869)
## No Information Rate : 0.8477
## P-Value [Acc > NIR] : 0.6129
##
## Kappa : 0.0278
##
## Mcnemar's Test P-Value : 8.636e-08
##
## Sensitivity : 0.99029
## Specificity : 0.02703
## Pos Pred Value : 0.85000
## Neg Pred Value : 0.33333
## Prevalence : 0.84774
## Detection Rate : 0.83951
## Detection Prevalence : 0.98765
## Balanced Accuracy : 0.50866
##
## 'Positive' Class : 0
##
5.1.1 Summary of Logistic model on imbalanced dataset
This logistic regression model has a high overall accuracy (84.36%),
but it separates the two classes poorly and almost never identifies
class 1 (depressed). The accuracy is slightly below the No Information
Rate (NIR, 84.77%), so the model does no better than always predicting
the majority class 0. Note that the output treats class 0 as the
'Positive' class, so sensitivity describes class 0 and specificity
describes class 1: sensitivity is very high, while specificity is
extremely low.
Accuracy: 84.36%. This figure mostly reflects the imbalance in the
data (about 85% of the test observations belong to class 0), not real
predictive skill.
Positive Predictive Value and Negative Predictive Value: PPV is 85%,
meaning that a prediction of class 0 is correct 85% of the time. NPV is
only 33.33%: the model predicted class 1 just three times and was right
once, so it cannot reliably identify class ‘1’.
Sensitivity and Specificity: Sensitivity is 99.03%, but specificity
is only 2.70%. The model correctly identifies only 1 of the 37 actual
class 1 cases.
Kappa statistic: Kappa is only 0.0278, which indicates that agreement
between predictions and true labels is barely better than chance.
Mcnemar’s Test P-value: 8.636e-08, indicating that the two types of
error are strongly asymmetric: the model misclassifies class 1 as class
0 (36 cases) far more often than the reverse (2 cases).
Balanced Accuracy: 50.87%, barely above the 50% a trivial classifier
would achieve, which confirms that the model handles the imbalanced
dataset poorly.
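The statistics discussed above can be recomputed by hand from the four cells of the confusion matrix. The following is a minimal sketch in base R; the variable names (`tp`, `fn`, `fp`, `tn`) are chosen here for illustration and do not come from the original analysis code. It follows the convention shown in the output, where class 0 is the 'Positive' class.

```r
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference; class 0 is "positive").
tp <- 204  # predicted 0, actual 0
fn <- 2    # predicted 1, actual 0
fp <- 36   # predicted 0, actual 1
tn <- 1    # predicted 1, actual 1
n  <- tp + fn + fp + tn

accuracy    <- (tp + tn) / n
nir         <- max(tp + fn, fp + tn) / n  # share of the majority class
sensitivity <- tp / (tp + fn)             # recall for class 0
specificity <- tn / (tn + fp)             # recall for class 1
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
balanced_accuracy <- (sensitivity + specificity) / 2

metrics <- round(c(accuracy = accuracy, nir = nir,
                   sensitivity = sensitivity, specificity = specificity,
                   ppv = ppv, npv = npv,
                   balanced_accuracy = balanced_accuracy), 4)
print(metrics)
```

Comparing `accuracy` with `nir` makes the main point explicit: a model that always predicts class 0 would score slightly higher on accuracy than this one.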
5.1.2 Weighted Logistic model
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 108 12
## 1 29 13
##
## Accuracy : 0.7469
## 95% CI : (0.6727, 0.8119)
## No Information Rate : 0.8457
## P-Value [Acc > NIR] : 0.99961
##
## Kappa : 0.2413
##
## Mcnemar's Test P-Value : 0.01246
##
## Sensitivity : 0.7883
## Specificity : 0.5200
## Pos Pred Value : 0.9000
## Neg Pred Value : 0.3095
## Prevalence : 0.8457
## Detection Rate : 0.6667
## Detection Prevalence : 0.7407
## Balanced Accuracy : 0.6542
##
## 'Positive' Class : 0
##
## [1] "AIC: 1238.59469568419"
5.1.2 Summary of Weighted Logistic model on imbalanced dataset
Accuracy: 74.69%, lower than the unweighted logistic model (84.36%)
and below the NIR (84.57%). Note that this test set has 162
observations, whereas the unweighted model was evaluated on 243, so the
two sets of figures are not directly comparable.
Predictive Ability: Predictions of class 0 are correct 90% of the
time (PPV), but predictions of class 1 are correct only 30.95% of the
time (NPV). Predictions of class 1 therefore remain unreliable, as with
the unweighted model, because class 1 is rare in the data.
Balanced Accuracy: 0.6542, well above the unweighted model (0.5087)
but still modest. Class 1 remains the harder class: specificity is 52%,
compared with a sensitivity of 78.83%.
Mcnemar’s Test P-Value: 0.01246, indicating that the two error types
are still unbalanced (29 class 0 cases predicted as 1 versus 12 class 1
cases predicted as 0).
Kappa: 0.2413, much higher than the unweighted model (0.0278),
because the class weights make the model pay more attention to class
‘1’.
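To illustrate how class weights change a logistic model, here is a minimal sketch on synthetic data. It is not the original analysis code: the data frame `toy`, the predictor `x`, and the inverse-frequency weighting scheme are all assumptions made for this example, and the weights used in the report may have been chosen differently. The sketch uses `quasibinomial()` because it gives the same coefficient estimates as `binomial()` while avoiding the warning R issues for non-integer weights; it does not report an AIC.

```r
set.seed(1)

# Synthetic stand-in data with a rare class 1 (not the report's data).
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-2 + 0.8 * x))
toy <- data.frame(x = x, y = y)

# Inverse-frequency weights (an assumed scheme): both classes end up
# contributing the same total weight to the fit.
n0 <- sum(toy$y == 0)
n1 <- sum(toy$y == 1)
toy$w <- ifelse(toy$y == 1, n0 / n1, 1)

fit_plain    <- glm(y ~ x, data = toy, family = binomial())
fit_weighted <- glm(y ~ x, data = toy, family = quasibinomial(),
                    weights = w)

# Classify at the usual 0.5 threshold.
pred_plain    <- as.integer(predict(fit_plain, type = "response") > 0.5)
pred_weighted <- as.integer(predict(fit_weighted, type = "response") > 0.5)

# Number of cases each model assigns to class 1.
print(c(plain = sum(pred_plain), weighted = sum(pred_weighted)))
```

Weighting mainly shifts the intercept, so the weighted model assigns more observations to class 1. That typically raises specificity (with class 0 as 'Positive') at the cost of sensitivity and overall accuracy, which is the pattern seen in the output above.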
5.1.3 Decision Tree model

## Call:
## rpart(formula = depressed ~ ., data = train_data, method = "class")
## n= 570
##
## CP nsplit rel error xerror xstd
## 1 0.03030303 0 1.0000000 1.000000 0.09802678
## 2 0.02272727 3 0.9090909 1.079545 0.10110870
## 3 0.01136364 5 0.8636364 1.181818 0.10478269
## 4 0.01000000 6 0.8522727 1.261364 0.10743548
##
## Variable importance
## hh_children years_of_edu med_sickdays_hhave
## 14 12 11
## cons_social fs_adwholed_often cons_other
## 10 10 8
## cons_nondurable durable_investment age
## 5 5 4
## cons_allfood nondurable_investment asset_durable
## 4 3 3
## fs_chwholed_often household_size asset_savings
## 3 2 1
## fs_sleephun fs_meat
## 1 1
##
## Node number 1: 570 observations, complexity param=0.03030303
## predicted class=0 expected loss=0.154386 P(node) =1
## class counts: 482 88
## probabilities: 0.846 0.154
## left son=2 (477 obs) right son=3 (93 obs)
## Primary splits:
## years_of_edu < 6.5 to the right, improve=4.107167, (0 missing)
## fs_adwholed_often < 2 to the left, improve=3.973162, (0 missing)
## children < 0.5 to the right, improve=2.285400, (0 missing)
## age < 66.5 to the left, improve=2.021110, (0 missing)
## asset_phone < 15.93529 to the right, improve=1.665369, (0 missing)
## Surrogate splits:
## age < 56.5 to the left, agree=0.870, adj=0.204, (0 split)
## household_size < 1.5 to the right, agree=0.840, adj=0.022, (0 split)
## cons_social < 24.08979 to the left, agree=0.840, adj=0.022, (0 split)
## children < 0.5 to the right, agree=0.839, adj=0.011, (0 split)
## cons_med_children < 12.8123 to the left, agree=0.839, adj=0.011, (0 split)
##
## Node number 2: 477 observations, complexity param=0.02272727
## predicted class=0 expected loss=0.1278826 P(node) =0.8368421
## class counts: 416 61
## probabilities: 0.872 0.128
## left son=4 (253 obs) right son=5 (224 obs)
## Primary splits:
## cons_social < 0.7273647 to the right, improve=1.8047670, (0 missing)
## durable_investment < 234.8594 to the right, improve=1.6072560, (0 missing)
## asset_phone < 63.26071 to the left, improve=1.3022130, (0 missing)
## ed_expenses < 45.6438 to the left, improve=0.8842878, (0 missing)
## fs_adwholed_often < 2 to the left, improve=0.8533597, (0 missing)
## Surrogate splits:
## cons_nondurable < 46.84868 to the right, agree=0.878, adj=0.741, (0 split)
## cons_other < 3.483343 to the right, agree=0.878, adj=0.741, (0 split)
## durable_investment < 36.43497 to the right, agree=0.878, adj=0.741, (0 split)
## nondurable_investment < 0.02780446 to the right, agree=0.878, adj=0.741, (0 split)
## asset_durable < 11.29084 to the right, agree=0.876, adj=0.737, (0 split)
##
## Node number 3: 93 observations, complexity param=0.03030303
## predicted class=0 expected loss=0.2903226 P(node) =0.1631579
## class counts: 66 27
## probabilities: 0.710 0.290
## left son=6 (69 obs) right son=7 (24 obs)
## Primary splits:
## fs_adwholed_often < 2 to the left, improve=4.087073, (0 missing)
## cons_social < 1.621556 to the left, improve=3.937965, (0 missing)
## nondurable_investment < 4.569942 to the left, improve=2.632348, (0 missing)
## cons_alcohol < 0.587257 to the right, improve=2.547581, (0 missing)
## ent_total_cost < 4.771246 to the left, improve=2.439231, (0 missing)
## Surrogate splits:
## fs_chwholed_often < 2 to the left, agree=0.785, adj=0.167, (0 split)
## fs_sleephun < 0.5 to the left, agree=0.774, adj=0.125, (0 split)
## fs_meat < 0.5 to the right, agree=0.763, adj=0.083, (0 split)
## cons_med_total < 1.040999 to the left, agree=0.753, adj=0.042, (0 split)
## cons_ed < 10.2098 to the left, agree=0.753, adj=0.042, (0 split)
##
## Node number 4: 253 observations
## predicted class=0 expected loss=0.08695652 P(node) =0.4438596
## class counts: 231 22
## probabilities: 0.913 0.087
##
## Node number 5: 224 observations, complexity param=0.02272727
## predicted class=0 expected loss=0.1741071 P(node) =0.3929825
## class counts: 185 39
## probabilities: 0.826 0.174
## left son=10 (214 obs) right son=11 (10 obs)
## Primary splits:
## hh_children < 4.5 to the left, improve=5.789736, (0 missing)
## fs_adwholed_often < 2 to the left, improve=4.149148, (0 missing)
## cons_ed < 1.287903 to the left, improve=3.827600, (0 missing)
## ed_expenses < 15.45483 to the left, improve=3.535107, (0 missing)
## cons_med_children < 0.4003842 to the right, improve=2.183618, (0 missing)
## Surrogate splits:
## asset_savings < 28.82767 to the left, agree=0.96, adj=0.1, (0 split)
## fs_chwholed_often < 2 to the left, agree=0.96, adj=0.1, (0 split)
## durable_investment < 843.3353 to the left, agree=0.96, adj=0.1, (0 split)
##
## Node number 6: 69 observations, complexity param=0.01136364
## predicted class=0 expected loss=0.2028986 P(node) =0.1210526
## class counts: 55 14
## probabilities: 0.797 0.203
## left son=12 (62 obs) right son=13 (7 obs)
## Primary splits:
## cons_other < 34.5932 to the left, improve=2.1160760, (0 missing)
## cons_alcohol < 0.587257 to the right, improve=0.9629084, (0 missing)
## children < 1.5 to the left, improve=0.8641367, (0 missing)
## years_of_edu < 5.5 to the right, improve=0.8101365, (0 missing)
## cons_nondurable < 241.4361 to the left, improve=0.7934950, (0 missing)
## Surrogate splits:
## cons_nondurable < 266.7672 to the left, agree=0.928, adj=0.286, (0 split)
## cons_social < 9.142107 to the left, agree=0.928, adj=0.286, (0 split)
## cons_allfood < 215.6241 to the left, agree=0.913, adj=0.143, (0 split)
##
## Node number 7: 24 observations, complexity param=0.03030303
## predicted class=1 expected loss=0.4583333 P(node) =0.04210526
## class counts: 11 13
## probabilities: 0.458 0.542
## left son=14 (14 obs) right son=15 (10 obs)
## Primary splits:
## med_sickdays_hhave < 1.525 to the left, improve=4.402381, (0 missing)
## cons_social < 1.354633 to the left, improve=2.938889, (0 missing)
## ed_schoolattend < 0.8571429 to the right, improve=2.937646, (0 missing)
## years_of_edu < 4.5 to the left, improve=2.288095, (0 missing)
## ent_total_cost < 4.771246 to the left, improve=1.399184, (0 missing)
## Surrogate splits:
## cons_social < 3.203074 to the left, agree=0.750, adj=0.4, (0 split)
## cons_allfood < 32.61111 to the right, agree=0.708, adj=0.3, (0 split)
## age < 66 to the left, agree=0.667, adj=0.2, (0 split)
## household_size < 1.5 to the right, agree=0.667, adj=0.2, (0 split)
## years_of_edu < 3.5 to the left, agree=0.667, adj=0.2, (0 split)
##
## Node number 10: 214 observations
## predicted class=0 expected loss=0.1495327 P(node) =0.3754386
## class counts: 182 32
## probabilities: 0.850 0.150
##
## Node number 11: 10 observations
## predicted class=1 expected loss=0.3 P(node) =0.01754386
## class counts: 3 7
## probabilities: 0.300 0.700
##
## Node number 12: 62 observations
## predicted class=0 expected loss=0.1612903 P(node) =0.1087719
## class counts: 52 10
## probabilities: 0.839 0.161
##
## Node number 13: 7 observations
## predicted class=1 expected loss=0.4285714 P(node) =0.0122807
## class counts: 3 4
## probabilities: 0.429 0.571
##
## Node number 14: 14 observations
## predicted class=0 expected loss=0.2857143 P(node) =0.0245614
## class counts: 10 4
## probabilities: 0.714 0.286
##
## Node number 15: 10 observations
## predicted class=1 expected loss=0.1 P(node) =0.01754386
## class counts: 1 9
## probabilities: 0.100 0.900
##
## n= 570
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 570 88 0 (0.84561404 0.15438596)
## 2) years_of_edu>=6.5 477 61 0 (0.87211740 0.12788260)
## 4) cons_social>=0.7273647 253 22 0 (0.91304348 0.08695652) *
## 5) cons_social< 0.7273647 224 39 0 (0.82589286 0.17410714)
## 10) hh_children< 4.5 214 32 0 (0.85046729 0.14953271) *
## 11) hh_children>=4.5 10 3 1 (0.30000000 0.70000000) *
## 3) years_of_edu< 6.5 93 27 0 (0.70967742 0.29032258)
## 6) fs_adwholed_often< 2 69 14 0 (0.79710145 0.20289855)
## 12) cons_other< 34.5932 62 10 0 (0.83870968 0.16129032) *
## 13) cons_other>=34.5932 7 3 1 (0.42857143 0.57142857) *
## 7) fs_adwholed_often>=2 24 11 1 (0.45833333 0.54166667)
## 14) med_sickdays_hhave< 1.525 14 4 0 (0.71428571 0.28571429) *
## 15) med_sickdays_hhave>=1.525 10 1 1 (0.10000000 0.90000000) *
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 199 36
## 1 7 1
##
## Accuracy : 0.823
## 95% CI : (0.7691, 0.8689)
## No Information Rate : 0.8477
## P-Value [Acc > NIR] : 0.8759
##
## Kappa : -0.0102
##
## Mcnemar's Test P-Value : 1.955e-05
##
## Sensitivity : 0.96602
## Specificity : 0.02703
## Pos Pred Value : 0.84681
## Neg Pred Value : 0.12500
## Prevalence : 0.84774
## Detection Rate : 0.81893
## Detection Prevalence : 0.96708
## Balanced Accuracy : 0.49652
##
## 'Positive' Class : 0
##
5.1.3 Summary of Decision Tree model on imbalanced dataset
Based on the decision tree and its confusion matrix, the model
predicts class 0 (non-depressed) almost exclusively and performs poorly
on class 1 (depressed), much like the unweighted logistic model
above.
Accuracy: 82.3%, which looks high but is below the NIR (84.77%), so
the tree does slightly worse than always predicting class 0.
Kappa statistic: Kappa is negative (-0.0102), meaning that agreement
with the true labels is no better than chance.
Sensitivity and specificity: the sensitivity is high (96.6%),
indicating that the model identifies non-depressed individuals well;
however, the specificity is extremely low (2.7%), meaning that it
correctly identifies only 1 of the 37 individuals who are actually
depressed.
Positive and negative predictive values: the positive predictive
value is 84.68%, but the negative predictive value is only 12.5% (1
correct out of 8 predictions of class 1), confirming the model’s poor
ability to predict class ‘1’.
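The negative Kappa and the small McNemar p-value reported for the decision tree can both be reproduced from the confusion matrix. The sketch below uses base R only; `cm` and `cohen_kappa` are names introduced here for illustration. Kappa compares the observed agreement with the agreement expected by chance from the row and column totals, and `mcnemar.test()` (from the `stats` package) compares the two off-diagonal error counts.

```r
# Counts copied from the decision tree confusion matrix above.
# matrix() fills column by column, so each column is one reference class.
cm <- matrix(c(199, 7, 36, 1), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

n        <- sum(cm)
observed <- sum(diag(cm)) / n                      # overall accuracy
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement by chance
cohen_kappa <- (observed - expected) / (1 - expected)

# McNemar's test on the off-diagonal cells (36 vs 7),
# with R's default continuity correction.
mcnemar_p <- mcnemar.test(cm)$p.value

print(round(cohen_kappa, 4))
print(signif(mcnemar_p, 4))
```

Because the tree predicts class 0 for almost every observation, the agreement expected by chance is already about 82.5%, slightly more than the observed accuracy, which is why Kappa falls below zero.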
5.2 Balanced dataset
5.2.1 Logistic model on balanced dataset (Upsampling)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 142 86
## 1 64 120
##
## Accuracy : 0.6359
## 95% CI : (0.5874, 0.6825)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 1.884e-08
##
## Kappa : 0.2718
##
## Mcnemar's Test P-Value : 0.08641
##
## Sensitivity : 0.6893
## Specificity : 0.5825
## Pos Pred Value : 0.6228
## Neg Pred Value : 0.6522
## Prevalence : 0.5000
## Detection Rate : 0.3447
## Detection Prevalence : 0.5534
## Balanced Accuracy : 0.6359
##
## 'Positive' Class : 0
##
5.2.1 Summary of Logistic model on balanced dataset
Summarizing the model performance: the model is moderately accurate
(63.59%) and clearly beats the NIR of 50%. Unlike the models trained on
the imbalanced data, it detects both classes at comparable rates,
although neither sensitivity nor specificity is high.
No Information Rate (NIR): 0.5, which confirms that the two classes
are balanced in the evaluation data.
P-Value [Acc > NIR]: 1.884e-08. This very small p-value indicates
that the model is significantly more accurate than always predicting a
single class.
Kappa: 0.2718, indicating modest agreement with the true labels
beyond chance.
Mcnemar’s Test P-Value: 0.08641. This value is greater than 0.05,
so there is no significant asymmetry between the two types of
misclassification.
Sensitivity: 68.93%. The model correctly identifies 68.93% of the
actual class 0 cases (class 0 is the 'Positive' class in this
output).
Specificity: 58.25%. The model correctly identifies 58.25% of the
actual class 1 cases, a large improvement over the 2.7% achieved by the
unweighted logistic and decision tree models on the imbalanced data.
Pos Pred Value, PPV: 62.28%. This is the percentage of class 0
predictions that are actually class 0: when the model predicts class 0,
it is correct 62.28% of the time.
Neg Pred Value, NPV: 65.22%. This is the percentage of class 1
predictions that are actually class 1: when the model predicts class 1,
it is correct 65.22% of the time.
Balanced Accuracy: 63.59%. This is the average of the sensitivity
and specificity; because the classes are balanced, it equals the
overall accuracy. The model is still somewhat better at recognizing
class 0 than class 1.
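Upsampling balances the classes by resampling rows of the minority class with replacement until it is as large as the majority class. The sketch below shows the idea in base R. The helper `upsample_minority()` and the data frame `toy` are hypothetical and written for this example only; the report itself may have used a package function for this step. The toy data reuse the class counts of the imbalanced training set shown earlier (482 versus 88), which after upsampling gives 964 rows.

```r
set.seed(42)

# Hypothetical helper: resample rows of each smaller class with
# replacement until every class has as many rows as the largest one.
upsample_minority <- function(df, target) {
  groups <- split(df, df[[target]])
  n_max  <- max(vapply(groups, nrow, integer(1)))
  resampled <- lapply(groups, function(g) {
    extra <- sample(seq_len(nrow(g)), n_max - nrow(g), replace = TRUE)
    g[c(seq_len(nrow(g)), extra), , drop = FALSE]
  })
  out <- do.call(rbind, resampled)
  rownames(out) <- NULL
  out
}

# Toy data with 482 rows of class 0 and 88 rows of class 1.
toy <- data.frame(depressed = rep(c(0, 1), times = c(482, 88)),
                  x = rnorm(570))

balanced <- upsample_minority(toy, "depressed")
print(table(balanced$depressed))
```

A common practice is to upsample only the training split and evaluate on untouched test data, because duplicated rows that appear in both splits can inflate the test metrics. Whether the analysis above follows that practice cannot be determined from the output shown here.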
5.2.2 Decision Tree model on balanced dataset (Upsampling)

## Call:
## rpart(formula = depressed ~ ., data = train_data, method = "class")
## n= 964
##
## CP nsplit rel error xerror xstd
## 1 0.15560166 0 1.0000000 1.0788382 0.03210758
## 2 0.10788382 1 0.8443983 0.9107884 0.03207941
## 3 0.03941909 2 0.7365145 0.7842324 0.03144917
## 4 0.01867220 3 0.6970954 0.7406639 0.03110591
## 5 0.01659751 5 0.6597510 0.7385892 0.03108789
## 6 0.01556017 8 0.6058091 0.6867220 0.03058654
## 7 0.01452282 15 0.4854772 0.6867220 0.03058654
## 8 0.01244813 16 0.4709544 0.6473029 0.03013808
## 9 0.01175657 17 0.4585062 0.6307054 0.02993114
## 10 0.01000000 20 0.4232365 0.6078838 0.02962849
##
## Variable importance
## cons_social cons_nondurable cons_other
## 10 8 7
## asset_durable cons_allfood cons_med_children
## 6 6 6
## durable_investment fs_chwholed_often med_sickdays_hhave
## 5 5 5
## fs_meat children asset_phone
## 5 4 4
## fs_adwholed_often household_size ent_total_cost
## 4 4 3
## marital_status age nondurable_investment
## 3 3 3
## fs_sleephun years_of_edu cons_alcohol
## 2 2 2
## cons_med_total cons_tobacco asset_savings
## 1 1 1
##
## Node number 1: 964 observations, complexity param=0.1556017
## predicted class=0 expected loss=0.5 P(node) =1
## class counts: 482 482
## probabilities: 0.500 0.500
## left son=2 (755 obs) right son=3 (209 obs)
## Primary splits:
## fs_adwholed_often < 2 to the left, improve=17.18210, (0 missing)
## med_sickdays_hhave < 1.469298 to the left, improve=15.26727, (0 missing)
## cons_social < 3.22643 to the right, improve=14.05538, (0 missing)
## durable_investment < 282.9833 to the right, improve=12.82277, (0 missing)
## years_of_edu < 6.5 to the right, improve=12.75755, (0 missing)
## Surrogate splits:
## fs_sleephun < 0.5 to the left, agree=0.859, adj=0.349, (0 split)
## fs_chwholed_often < 0.9384058 to the left, agree=0.856, adj=0.335, (0 split)
## cons_social < 20.57975 to the left, agree=0.793, adj=0.043, (0 split)
## cons_med_total < 29.62843 to the left, agree=0.789, adj=0.029, (0 split)
## asset_savings < 94.49069 to the left, agree=0.785, adj=0.010, (0 split)
##
## Node number 2: 755 observations, complexity param=0.1078838
## predicted class=0 expected loss=0.4503311 P(node) =0.783195
## class counts: 415 340
## probabilities: 0.550 0.450
## left son=4 (357 obs) right son=5 (398 obs)
## Primary splits:
## cons_social < 0.08007685 to the right, improve=22.26146, (0 missing)
## fs_meat < 3.03444 to the left, improve=21.27456, (0 missing)
## nondurable_investment < 0.3447753 to the right, improve=17.67330, (0 missing)
## ent_total_cost < 0.07785249 to the right, improve=16.71426, (0 missing)
## cons_nondurable < 13.62031 to the right, improve=16.65899, (0 missing)
## Surrogate splits:
## asset_durable < 12.0916 to the right, agree=0.959, adj=0.913, (0 split)
## durable_investment < 20.90006 to the right, agree=0.958, adj=0.910, (0 split)
## cons_nondurable < 13.62031 to the right, agree=0.956, adj=0.908, (0 split)
## cons_allfood < 2.616797 to the right, agree=0.956, adj=0.908, (0 split)
## cons_other < 0.4804611 to the right, agree=0.955, adj=0.905, (0 split)
##
## Node number 3: 209 observations, complexity param=0.01659751
## predicted class=1 expected loss=0.3205742 P(node) =0.216805
## class counts: 67 142
## probabilities: 0.321 0.679
## left son=6 (108 obs) right son=7 (101 obs)
## Primary splits:
## med_sickdays_hhave < 1.775 to the left, improve=10.279040, (0 missing)
## durable_investment < 283.7841 to the right, improve= 8.556608, (0 missing)
## cons_other < 44.05828 to the right, improve= 6.367930, (0 missing)
## ent_total_cost < 0.6517366 to the left, improve= 5.469878, (0 missing)
## years_of_edu < 8.5 to the right, improve= 4.869325, (0 missing)
## Surrogate splits:
## cons_social < 2.462363 to the left, agree=0.646, adj=0.267, (0 split)
## cons_med_total < 14.01345 to the left, agree=0.622, adj=0.218, (0 split)
## cons_med_children < 0.8808453 to the left, agree=0.622, adj=0.218, (0 split)
## marital_status < 0.5 to the right, agree=0.617, adj=0.208, (0 split)
## nondurable_investment < 40.25419 to the left, agree=0.612, adj=0.198, (0 split)
##
## Node number 4: 357 observations, complexity param=0.03941909
## predicted class=0 expected loss=0.3221289 P(node) =0.370332
## class counts: 242 115
## probabilities: 0.678 0.322
## left son=8 (284 obs) right son=9 (73 obs)
## Primary splits:
## asset_phone < 63.26071 to the left, improve=17.411140, (0 missing)
## fs_meat < 4.5 to the left, improve=10.299610, (0 missing)
## asset_land_owned_total < 1.3 to the right, improve= 8.888106, (0 missing)
## asset_durable < 523.8868 to the left, improve= 7.820506, (0 missing)
## cons_med_total < 12.33183 to the left, improve= 6.885786, (0 missing)
## Surrogate splits:
## cons_med_children < 10.57014 to the left, agree=0.821, adj=0.123, (0 split)
## asset_durable < 584.561 to the left, agree=0.812, adj=0.082, (0 split)
## nondurable_investment < 0.3447753 to the right, agree=0.812, adj=0.082, (0 split)
## cons_nondurable < 423.3387 to the left, agree=0.810, adj=0.068, (0 split)
## cons_other < 77.35424 to the left, agree=0.810, adj=0.068, (0 split)
##
## Node number 5: 398 observations, complexity param=0.01556017
## predicted class=1 expected loss=0.4346734 P(node) =0.4128631
## class counts: 173 225
## probabilities: 0.435 0.565
## left son=10 (312 obs) right son=11 (86 obs)
## Primary splits:
## fs_chwholed_often < 0.112069 to the right, improve=8.963033, (0 missing)
## children < 0.5 to the right, improve=7.398128, (0 missing)
## cons_med_children < 0.6780793 to the right, improve=7.359677, (0 missing)
## marital_status < 0.5 to the right, improve=7.297268, (0 missing)
## med_sickdays_hhave < 3.75 to the left, improve=5.508631, (0 missing)
## Surrogate splits:
## cons_med_children < 0.1501441 to the right, agree=0.987, adj=0.942, (0 split)
## children < 0.5 to the right, agree=0.925, adj=0.651, (0 split)
## household_size < 2.5 to the right, agree=0.894, adj=0.512, (0 split)
## cons_nondurable < 14.34405 to the left, agree=0.862, adj=0.360, (0 split)
## asset_durable < 2.882766 to the left, agree=0.862, adj=0.360, (0 split)
##
## Node number 6: 108 observations, complexity param=0.01659751
## predicted class=1 expected loss=0.4722222 P(node) =0.1120332
## class counts: 51 57
## probabilities: 0.472 0.528
## left son=12 (16 obs) right son=13 (92 obs)
## Primary splits:
## cons_social < 4.170669 to the right, improve=10.463770, (0 missing)
## marital_status < 0.5 to the left, improve= 8.233333, (0 missing)
## ent_total_cost < 0.8713918 to the left, improve= 7.724959, (0 missing)
## fs_meat < 3.5 to the right, improve= 6.699595, (0 missing)
## asset_durable < 61.01856 to the right, improve= 6.519048, (0 missing)
## Surrogate splits:
## cons_med_children < 2.001921 to the right, agree=0.880, adj=0.188, (0 split)
## cons_other < 44.37859 to the right, agree=0.880, adj=0.188, (0 split)
## med_sickdays_hhave < 1.669643 to the right, agree=0.880, adj=0.188, (0 split)
## cons_med_total < 2.322229 to the right, agree=0.870, adj=0.125, (0 split)
## years_of_edu < 11 to the right, agree=0.861, adj=0.063, (0 split)
##
## Node number 7: 101 observations
## predicted class=1 expected loss=0.1584158 P(node) =0.1047718
## class counts: 16 85
## probabilities: 0.158 0.842
##
## Node number 8: 284 observations, complexity param=0.01175657
## predicted class=0 expected loss=0.2429577 P(node) =0.2946058
## class counts: 215 69
## probabilities: 0.757 0.243
## left son=16 (117 obs) right son=17 (167 obs)
## Primary splits:
## ent_total_cost < 3.359891 to the left, improve=8.827632, (0 missing)
## nondurable_investment < 3.705779 to the left, improve=8.730911, (0 missing)
## ent_nonag_revenue < 333.1197 to the left, improve=5.414585, (0 missing)
## ed_expenses < 15.61499 to the left, improve=5.241596, (0 missing)
## years_of_edu < 5.5 to the right, improve=5.046073, (0 missing)
## Surrogate splits:
## nondurable_investment < 7.223599 to the left, agree=0.866, adj=0.675, (0 split)
## cons_nondurable < 129.3255 to the left, agree=0.683, adj=0.231, (0 split)
## cons_other < 13.70115 to the left, agree=0.662, adj=0.179, (0 split)
## durable_investment < 140.7668 to the left, agree=0.651, adj=0.154, (0 split)
## cons_allfood < 85.47331 to the left, agree=0.648, adj=0.145, (0 split)
##
## Node number 9: 73 observations, complexity param=0.0186722
## predicted class=1 expected loss=0.369863 P(node) =0.07572614
## class counts: 27 46
## probabilities: 0.370 0.630
## left son=18 (29 obs) right son=19 (44 obs)
## Primary splits:
## fs_meat < 3.5 to the left, improve=7.833040, (0 missing)
## cons_social < 3.269805 to the right, improve=6.299506, (0 missing)
## fs_sleephun < 0.5 to the right, improve=4.150348, (0 missing)
## age < 29.5 to the right, improve=4.109722, (0 missing)
## years_of_edu < 9.5 to the left, improve=3.661097, (0 missing)
## Surrogate splits:
## cons_other < 24.34336 to the left, agree=0.836, adj=0.586, (0 split)
## fs_sleephun < 0.5 to the right, agree=0.767, adj=0.414, (0 split)
## cons_allfood < 137.5272 to the left, agree=0.753, adj=0.379, (0 split)
## cons_nondurable < 92.18123 to the left, agree=0.740, adj=0.345, (0 split)
## asset_savings < 3.203074 to the left, agree=0.726, adj=0.310, (0 split)
##
## Node number 10: 312 observations, complexity param=0.01556017
## predicted class=1 expected loss=0.4903846 P(node) =0.3236515
## class counts: 153 159
## probabilities: 0.490 0.510
## left son=20 (10 obs) right son=21 (302 obs)
## Primary splits:
## household_size < 2.5 to the left, improve=5.366149, (0 missing)
## children < 1.5 to the left, improve=4.720085, (0 missing)
## age < 22.5 to the left, improve=3.436915, (0 missing)
## marital_status < 0.5 to the right, improve=2.763880, (0 missing)
## cons_med_children < 1.525836 to the right, improve=2.550214, (0 missing)
##
## Node number 11: 86 observations
## predicted class=1 expected loss=0.2325581 P(node) =0.08921162
## class counts: 20 66
## probabilities: 0.233 0.767
##
## Node number 12: 16 observations
## predicted class=0 expected loss=0 P(node) =0.01659751
## class counts: 16 0
## probabilities: 1.000 0.000
##
## Node number 13: 92 observations, complexity param=0.01659751
## predicted class=1 expected loss=0.3804348 P(node) =0.09543568
## class counts: 35 57
## probabilities: 0.380 0.620
## left son=26 (10 obs) right son=27 (82 obs)
## Primary splits:
## marital_status < 0.5 to the left, improve=8.613468, (0 missing)
## cons_allfood < 65.76883 to the left, improve=7.987229, (0 missing)
## ent_total_cost < 0.8713918 to the left, improve=6.309825, (0 missing)
## age < 50 to the right, improve=5.816624, (0 missing)
## nondurable_investment < 1.782266 to the left, improve=5.816624, (0 missing)
## Surrogate splits:
## age < 53 to the right, agree=0.935, adj=0.4, (0 split)
## asset_land_owned_total < 2.495 to the right, agree=0.913, adj=0.2, (0 split)
##
## Node number 16: 117 observations
## predicted class=0 expected loss=0.09401709 P(node) =0.1213693
## class counts: 106 11
## probabilities: 0.906 0.094
##
## Node number 17: 167 observations, complexity param=0.01175657
## predicted class=0 expected loss=0.3473054 P(node) =0.1732365
## class counts: 109 58
## probabilities: 0.653 0.347
## left son=34 (125 obs) right son=35 (42 obs)
## Primary splits:
## years_of_edu < 7.5 to the right, improve=6.898480, (0 missing)
## cons_nondurable < 53.34834 to the right, improve=5.580367, (0 missing)
## durable_investment < 191.5102 to the right, improve=4.503051, (0 missing)
## ent_total_cost < 16.04206 to the right, improve=4.358415, (0 missing)
## asset_phone < 41.63996 to the right, improve=4.268866, (0 missing)
## Surrogate splits:
## cons_nondurable < 66.13776 to the right, agree=0.802, adj=0.214, (0 split)
## cons_allfood < 38.47273 to the right, agree=0.802, adj=0.214, (0 split)
## age < 42 to the left, agree=0.796, adj=0.190, (0 split)
## cons_tobacco < 1.014088 to the left, agree=0.790, adj=0.167, (0 split)
## durable_investment < 100.6285 to the right, agree=0.784, adj=0.143, (0 split)
##
## Node number 18: 29 observations, complexity param=0.0186722
## predicted class=0 expected loss=0.3448276 P(node) =0.03008299
## class counts: 19 10
## probabilities: 0.655 0.345
## left son=36 (18 obs) right son=37 (11 obs)
## Primary splits:
## fs_meat < 1.5 to the right, improve=11.285270, (0 missing)
## cons_social < 2.602498 to the right, improve= 6.436782, (0 missing)
## cons_other < 20.54772 to the right, improve= 5.603448, (0 missing)
## ed_expenses < 20.65983 to the left, improve= 4.214559, (0 missing)
## asset_durable < 251.7616 to the right, improve= 3.629764, (0 missing)
## Surrogate splits:
## cons_social < 2.602498 to the right, agree=0.862, adj=0.636, (0 split)
## cons_nondurable < 84.25133 to the right, agree=0.759, adj=0.364, (0 split)
## asset_durable < 153.0269 to the right, agree=0.759, adj=0.364, (0 split)
## cons_other < 20.54772 to the right, agree=0.759, adj=0.364, (0 split)
## med_sickdays_hhave < 3.925 to the left, agree=0.759, adj=0.364, (0 split)
##
## Node number 19: 44 observations
## predicted class=1 expected loss=0.1818182 P(node) =0.04564315
## class counts: 8 36
## probabilities: 0.182 0.818
##
## Node number 20: 10 observations
## predicted class=0 expected loss=0 P(node) =0.01037344
## class counts: 10 0
## probabilities: 1.000 0.000
##
## Node number 21: 302 observations, complexity param=0.01556017
## predicted class=1 expected loss=0.4735099 P(node) =0.313278
## class counts: 143 159
## probabilities: 0.474 0.526
## left son=42 (244 obs) right son=43 (58 obs)
## Primary splits:
## marital_status < 0.5 to the right, improve=4.672824, (0 missing)
## age < 22.5 to the left, improve=2.490137, (0 missing)
## children < 6.5 to the left, improve=2.361525, (0 missing)
## fs_chwholed_often < 0.4395492 to the right, improve=1.611502, (0 missing)
## years_of_edu < 7.5 to the right, improve=1.274011, (0 missing)
## Surrogate splits:
## years_of_edu < 5.5 to the right, agree=0.821, adj=0.069, (0 split)
##
## Node number 26: 10 observations
## predicted class=0 expected loss=0 P(node) =0.01037344
## class counts: 10 0
## probabilities: 1.000 0.000
##
## Node number 27: 82 observations, complexity param=0.01452282
## predicted class=1 expected loss=0.304878 P(node) =0.08506224
## class counts: 25 57
## probabilities: 0.305 0.695
## left son=54 (13 obs) right son=55 (69 obs)
## Primary splits:
## ent_total_cost < 0.8713918 to the left, improve=6.662452, (0 missing)
## cons_allfood < 65.76883 to the left, improve=5.572647, (0 missing)
## cons_social < 1.000961 to the right, improve=4.943337, (0 missing)
## cons_nondurable < 73.02399 to the left, improve=3.715162, (0 missing)
## asset_durable < 61.01856 to the right, improve=3.637501, (0 missing)
## Surrogate splits:
## nondurable_investment < 2.008594 to the left, agree=0.927, adj=0.538, (0 split)
## cons_nondurable < 41.84797 to the left, agree=0.890, adj=0.308, (0 split)
## fs_meat < 0.5 to the left, agree=0.878, adj=0.231, (0 split)
##
## Node number 34: 125 observations
## predicted class=0 expected loss=0.264 P(node) =0.129668
## class counts: 92 33
## probabilities: 0.736 0.264
##
## Node number 35: 42 observations, complexity param=0.01175657
## predicted class=1 expected loss=0.4047619 P(node) =0.04356846
## class counts: 17 25
## probabilities: 0.405 0.595
## left son=70 (9 obs) right son=71 (33 obs)
## Primary splits:
## cons_alcohol < 0.587257 to the right, improve=8.116883, (0 missing)
## ent_total_cost < 10.20757 to the right, improve=5.418873, (0 missing)
## cons_ed < 2.335575 to the left, improve=4.132326, (0 missing)
## cons_tobacco < 0.6978126 to the right, improve=4.004762, (0 missing)
## ed_expenses < 31.87059 to the left, improve=3.569903, (0 missing)
## Surrogate splits:
## cons_tobacco < 0.6978126 to the right, agree=0.833, adj=0.222, (0 split)
## cons_ed < 0.6673071 to the left, agree=0.833, adj=0.222, (0 split)
## med_sickdays_hhave < 4.45 to the right, agree=0.833, adj=0.222, (0 split)
## ed_expenses < 8.007685 to the left, agree=0.833, adj=0.222, (0 split)
## durable_investment < 84.29773 to the left, agree=0.833, adj=0.222, (0 split)
##
## Node number 36: 18 observations
## predicted class=0 expected loss=0 P(node) =0.0186722
## class counts: 18 0
## probabilities: 1.000 0.000
##
## Node number 37: 11 observations
## predicted class=1 expected loss=0.09090909 P(node) =0.01141079
## class counts: 1 10
## probabilities: 0.091 0.909
##
## Node number 42: 244 observations, complexity param=0.01556017
## predicted class=0 expected loss=0.4836066 P(node) =0.253112
## class counts: 126 118
## probabilities: 0.516 0.484
## left son=84 (212 obs) right son=85 (32 obs)
## Primary splits:
## children < 5.5 to the left, improve=3.062249, (0 missing)
## age < 24.5 to the left, improve=2.760817, (0 missing)
## fs_chwholed_often < 0.4395492 to the right, improve=2.287265, (0 missing)
## years_of_edu < 4.5 to the left, improve=2.127327, (0 missing)
## cons_med_children < 2.100913 to the right, improve=1.663826, (0 missing)
## Surrogate splits:
## cons_med_children < 1.136585 to the right, agree=0.963, adj=0.719, (0 split)
## household_size < 7.5 to the left, agree=0.947, adj=0.594, (0 split)
## med_sickdays_hhave < 1.295687 to the right, agree=0.947, adj=0.594, (0 split)
## fs_chwholed_often < 0.5664414 to the left, agree=0.881, adj=0.094, (0 split)
##
## Node number 43: 58 observations
## predicted class=1 expected loss=0.2931034 P(node) =0.06016598
## class counts: 17 41
## probabilities: 0.293 0.707
##
## Node number 54: 13 observations
## predicted class=0 expected loss=0.2307692 P(node) =0.01348548
## class counts: 10 3
## probabilities: 0.769 0.231
##
## Node number 55: 69 observations
## predicted class=1 expected loss=0.2173913 P(node) =0.07157676
## class counts: 15 54
## probabilities: 0.217 0.783
##
## Node number 70: 9 observations
## predicted class=0 expected loss=0 P(node) =0.0093361
## class counts: 9 0
## probabilities: 1.000 0.000
##
## Node number 71: 33 observations
## predicted class=1 expected loss=0.2424242 P(node) =0.03423237
## class counts: 8 25
## probabilities: 0.242 0.758
##
## Node number 84: 212 observations, complexity param=0.01556017
## predicted class=0 expected loss=0.4528302 P(node) =0.219917
## class counts: 116 96
## probabilities: 0.547 0.453
## left son=168 (56 obs) right son=169 (156 obs)
## Primary splits:
## household_size < 5.5 to the right, improve=3.390853, (0 missing)
## fs_chwholed_often < 0.4288448 to the right, improve=3.227228, (0 missing)
## years_of_edu < 13.5 to the right, improve=2.968799, (0 missing)
## med_sickdays_hhave < 1.525668 to the left, improve=2.657614, (0 missing)
## age < 31.5 to the right, improve=2.086907, (0 missing)
## Surrogate splits:
## children < 3.5 to the right, agree=0.934, adj=0.750, (0 split)
## med_sickdays_hhave < 1.525668 to the left, agree=0.906, adj=0.643, (0 split)
## fs_chwholed_often < 0.4288448 to the right, agree=0.877, adj=0.536, (0 split)
## cons_med_children < 1.238365 to the left, agree=0.858, adj=0.464, (0 split)
## age < 38.5 to the right, agree=0.811, adj=0.286, (0 split)
##
## Node number 85: 32 observations
## predicted class=1 expected loss=0.3125 P(node) =0.03319502
## class counts: 10 22
## probabilities: 0.312 0.688
##
## Node number 168: 56 observations
## predicted class=0 expected loss=0.3035714 P(node) =0.05809129
## class counts: 39 17
## probabilities: 0.696 0.304
##
## Node number 169: 156 observations, complexity param=0.01556017
## predicted class=1 expected loss=0.4935897 P(node) =0.1618257
## class counts: 77 79
## probabilities: 0.494 0.506
## left son=338 (69 obs) right son=339 (87 obs)
## Primary splits:
## age < 24.5 to the left, improve=4.1560950, (0 missing)
## children < 1.5 to the left, improve=3.0010680, (0 missing)
## cons_med_children < 2.100913 to the right, improve=3.0010680, (0 missing)
## fs_chwholed_often < 0.526824 to the right, improve=3.0010680, (0 missing)
## years_of_edu < 11.5 to the left, improve=0.8312297, (0 missing)
## Surrogate splits:
## household_size < 4.5 to the left, agree=0.628, adj=0.159, (0 split)
## med_sickdays_hhave < 1.6844 to the right, agree=0.628, adj=0.159, (0 split)
## years_of_edu < 8.5 to the left, agree=0.622, adj=0.145, (0 split)
## children < 3.5 to the right, agree=0.583, adj=0.058, (0 split)
## cons_med_children < 1.238365 to the left, agree=0.583, adj=0.058, (0 split)
##
## Node number 338: 69 observations, complexity param=0.01244813
## predicted class=0 expected loss=0.3768116 P(node) =0.07157676
## class counts: 43 26
## probabilities: 0.623 0.377
## left son=676 (55 obs) right son=677 (14 obs)
## Primary splits:
## age < 17.5 to the right, improve=4.0006020, (0 missing)
## years_of_edu < 8.5 to the right, improve=2.4492750, (0 missing)
## med_sickdays_hhave < 1.6844 to the left, improve=1.7536230, (0 missing)
## household_size < 4.5 to the right, improve=1.7536230, (0 missing)
## fs_chwholed_often < 0.4288448 to the left, improve=0.7443551, (0 missing)
## Surrogate splits:
## children < 3.5 to the left, agree=0.855, adj=0.286, (0 split)
## cons_med_children < 1.238365 to the right, agree=0.855, adj=0.286, (0 split)
## household_size < 3.5 to the right, agree=0.826, adj=0.143, (0 split)
## med_sickdays_hhave < 2.125778 to the left, agree=0.826, adj=0.143, (0 split)
##
## Node number 339: 87 observations, complexity param=0.01556017
## predicted class=1 expected loss=0.3908046 P(node) =0.09024896
## class counts: 34 53
## probabilities: 0.391 0.609
## left son=678 (9 obs) right son=679 (78 obs)
## Primary splits:
## fs_chwholed_often < 0.4288448 to the right, improve=7.450928, (0 missing)
## children < 1.5 to the left, improve=5.650287, (0 missing)
## cons_med_children < 2.100913 to the right, improve=5.650287, (0 missing)
## age < 32 to the right, improve=3.321839, (0 missing)
## years_of_edu < 12.5 to the right, improve=1.088692, (0 missing)
## Surrogate splits:
## children < 1.5 to the left, agree=0.977, adj=0.778, (0 split)
## cons_med_children < 2.100913 to the right, agree=0.977, adj=0.778, (0 split)
##
## Node number 676: 55 observations
## predicted class=0 expected loss=0.2909091 P(node) =0.05705394
## class counts: 39 16
## probabilities: 0.709 0.291
##
## Node number 677: 14 observations
## predicted class=1 expected loss=0.2857143 P(node) =0.01452282
## class counts: 4 10
## probabilities: 0.286 0.714
##
## Node number 678: 9 observations
## predicted class=0 expected loss=0 P(node) =0.0093361
## class counts: 9 0
## probabilities: 1.000 0.000
##
## Node number 679: 78 observations
## predicted class=1 expected loss=0.3205128 P(node) =0.08091286
## class counts: 25 53
## probabilities: 0.321 0.679
##
## n= 964
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 964 482 0 (0.50000000 0.50000000)
## 2) fs_adwholed_often< 2 755 340 0 (0.54966887 0.45033113)
## 4) cons_social>=0.08007685 357 115 0 (0.67787115 0.32212885)
## 8) asset_phone< 63.26071 284 69 0 (0.75704225 0.24295775)
## 16) ent_total_cost< 3.359891 117 11 0 (0.90598291 0.09401709) *
## 17) ent_total_cost>=3.359891 167 58 0 (0.65269461 0.34730539)
## 34) years_of_edu>=7.5 125 33 0 (0.73600000 0.26400000) *
## 35) years_of_edu< 7.5 42 17 1 (0.40476190 0.59523810)
## 70) cons_alcohol>=0.587257 9 0 0 (1.00000000 0.00000000) *
## 71) cons_alcohol< 0.587257 33 8 1 (0.24242424 0.75757576) *
## 9) asset_phone>=63.26071 73 27 1 (0.36986301 0.63013699)
## 18) fs_meat< 3.5 29 10 0 (0.65517241 0.34482759)
## 36) fs_meat>=1.5 18 0 0 (1.00000000 0.00000000) *
## 37) fs_meat< 1.5 11 1 1 (0.09090909 0.90909091) *
## 19) fs_meat>=3.5 44 8 1 (0.18181818 0.81818182) *
## 5) cons_social< 0.08007685 398 173 1 (0.43467337 0.56532663)
## 10) fs_chwholed_often>=0.112069 312 153 1 (0.49038462 0.50961538)
## 20) household_size< 2.5 10 0 0 (1.00000000 0.00000000) *
## 21) household_size>=2.5 302 143 1 (0.47350993 0.52649007)
## 42) marital_status>=0.5 244 118 0 (0.51639344 0.48360656)
## 84) children< 5.5 212 96 0 (0.54716981 0.45283019)
## 168) household_size>=5.5 56 17 0 (0.69642857 0.30357143) *
## 169) household_size< 5.5 156 77 1 (0.49358974 0.50641026)
## 338) age< 24.5 69 26 0 (0.62318841 0.37681159)
## 676) age>=17.5 55 16 0 (0.70909091 0.29090909) *
## 677) age< 17.5 14 4 1 (0.28571429 0.71428571) *
## 339) age>=24.5 87 34 1 (0.39080460 0.60919540)
## 678) fs_chwholed_often>=0.4288448 9 0 0 (1.00000000 0.00000000) *
## 679) fs_chwholed_often< 0.4288448 78 25 1 (0.32051282 0.67948718) *
## 85) children>=5.5 32 10 1 (0.31250000 0.68750000) *
## 43) marital_status< 0.5 58 17 1 (0.29310345 0.70689655) *
## 11) fs_chwholed_often< 0.112069 86 20 1 (0.23255814 0.76744186) *
## 3) fs_adwholed_often>=2 209 67 1 (0.32057416 0.67942584)
## 6) med_sickdays_hhave< 1.775 108 51 1 (0.47222222 0.52777778)
## 12) cons_social>=4.170669 16 0 0 (1.00000000 0.00000000) *
## 13) cons_social< 4.170669 92 35 1 (0.38043478 0.61956522)
## 26) marital_status< 0.5 10 0 0 (1.00000000 0.00000000) *
## 27) marital_status>=0.5 82 25 1 (0.30487805 0.69512195)
## 54) ent_total_cost< 0.8713918 13 3 0 (0.76923077 0.23076923) *
## 55) ent_total_cost>=0.8713918 69 15 1 (0.21739130 0.78260870) *
## 7) med_sickdays_hhave>=1.775 101 16 1 (0.15841584 0.84158416) *
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 138 52
## 1 68 154
##
## Accuracy : 0.7087
## 95% CI : (0.6623, 0.7522)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.4175
##
## Mcnemar's Test P-Value : 0.1709
##
## Sensitivity : 0.6699
## Specificity : 0.7476
## Pos Pred Value : 0.7263
## Neg Pred Value : 0.6937
## Prevalence : 0.5000
## Detection Rate : 0.3350
## Detection Prevalence : 0.4612
## Balanced Accuracy : 0.7087
##
## 'Positive' Class : 0
##
5.2.2 Summary of Decision Tree model on balanced dataset (Upsampling)
Summary of model performance: The accuracy is 70.87% (95% CI: 66.23%
to 75.22%), well above the No Information Rate of 50% (p < 2e-16), so
the model distinguishes between the two categories (depressed and
non-depressed) better than chance. The Kappa statistic is 0.4175, which
indicates moderate agreement between the predictions and the true
labels. Overall, the model shows some validity on the balanced dataset.
Kappa: 0.4175. Kappa values between 0.4 and 0.6 are usually read as
moderate agreement, beyond chance, between predicted and actual classes.
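As a check on the figure above, kappa can be recomputed from the confusion matrix alone. The sketch below uses plain Python as a calculator (the report's own numbers come from R); the function name `cohen_kappa` is illustrative and not part of the original analysis.

```python
# Confusion matrix from the report (rows = prediction, columns = reference).
matrix = [[138, 52],
          [68, 154]]


def cohen_kappa(m):
    """Cohen's kappa for a square confusion matrix given as nested lists."""
    size = len(m)
    total = sum(sum(row) for row in m)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(m[i][i] for i in range(size)) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


kappa = cohen_kappa(matrix)
print(round(kappa, 4))  # 0.4175
```

Chance agreement is exactly 0.5 here because the two reference classes are the same size, so kappa reduces to (0.7087 - 0.5) / 0.5.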
McNemar’s Test P-Value: 0.1709, which is higher than 0.05. The two
kinds of error (52 depressed instances predicted as non-depressed versus
68 non-depressed instances predicted as depressed) do not differ
significantly, so the model’s errors are roughly balanced between the
two categories.
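The p-value of 0.1709 is consistent with a McNemar test with continuity correction applied to the two off-diagonal counts of the confusion matrix (52 and 68). A minimal sketch in plain Python, assuming this is the variant behind the reported value; `mcnemar_p_value` is a hypothetical helper name.

```python
import math


def mcnemar_p_value(b, c):
    """McNemar test with continuity correction for discordant counts b and c.

    Returns the chi-squared statistic (1 degree of freedom) and its p-value.
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = mcnemar_p_value(52, 68)
print(statistic)          # 1.875
print(round(p_value, 3))  # about 0.171
```

Only the misclassified cases enter the test; the 138 and 154 correct predictions do not affect it.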
Sensitivity: 66.99%. Because the positive class is 0, this means the
model correctly identifies approximately 67% of non-depressed instances
(138 of 206).
Specificity: 74.76%. The model correctly identifies approximately 75%
of the instances of depression (154 of 206).
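Balanced Accuracy: 70.87%. This is the mean of sensitivity and
specificity. It equals the overall accuracy here because both classes
have the same number of test instances, and it indicates moderate,
reasonably even performance on both categories.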
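The sensitivity, specificity and balanced accuracy quoted above follow directly from the four cells of the confusion matrix. The sketch below recomputes them in plain Python, treating class 0 (non-depressed) as the positive class, as the output states; the variable names are illustrative.

```python
# Test-set counts from the confusion matrix above; the positive class is 0.
true_pos = 138   # reference 0, predicted 0
false_neg = 68   # reference 0, predicted 1
false_pos = 52   # reference 1, predicted 0
true_neg = 154   # reference 1, predicted 1

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
balanced_accuracy = (sensitivity + specificity) / 2
accuracy = (true_pos + true_neg) / (true_pos + false_neg + false_pos + true_neg)

print(round(sensitivity, 4))        # 0.6699
print(round(specificity, 4))        # 0.7476
print(round(balanced_accuracy, 4))  # 0.7087
print(round(accuracy, 4))           # 0.7087
```

Balanced accuracy and accuracy coincide only because each reference class has 206 test cases; on an imbalanced test set the two would differ.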
Variable Importance and Split of the Model
Variable importance: Consumption-related features such as
“cons_social”, “cons_nondurable”, and “cons_other” contributed
noticeably to the model’s splits, suggesting an association between
economic activity and depressive state.
Main splits of the decision tree: The model first splits on
“fs_adwholed_often” (Frequency of purchasing full-price food items on a
regular basis), which suggests that household food status is an
important factor in predicting depressive status. At the next level, the
tree splits on the social factor “cons_social” and the health factor
“med_sickdays_hhave”.
5.3.1 Logistic model on balanced dataset (down sampling)
## [1] "Number of rows: 1147"
## [1] "Number of columns: 35"
## [1] "Number of missing values: 0"
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 20 15
## 1 17 22
##
## Accuracy : 0.5676
## 95% CI : (0.4472, 0.6823)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 0.1477
##
## Kappa : 0.1351
##
## Mcnemar's Test P-Value : 0.8597
##
## Sensitivity : 0.5405
## Specificity : 0.5946
## Pos Pred Value : 0.5714
## Neg Pred Value : 0.5641
## Prevalence : 0.5000
## Detection Rate : 0.2703
## Detection Prevalence : 0.4730
## Balanced Accuracy : 0.5676
##
## 'Positive' Class : 0
##
5.3.1 Summary of Logistic model on balanced dataset (down sampling)
Summary of model performance
This logistic regression model performed poorly on the down-sampled
balanced dataset, with an accuracy of 56.76%, which is not significantly
better than the No Information Rate of 50% (p = 0.1477). The model is
therefore not effective in distinguishing between depressed and
non-depressed states. The Kappa statistic of 0.1351 indicates only
slight agreement beyond chance.
Accuracy: 56.76%. The overall accuracy of the model is low,
indicating its limited discriminatory power.
95% CI (Confidence Interval): (44.72%, 68.23%). The interval is wide
because the test set has only 74 observations, and it includes 50%, so
the accuracy estimate is imprecise and is compatible with chance-level
performance.
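The interval quoted above matches an exact (Clopper-Pearson) binomial interval for 42 correct predictions out of 74 test cases. The sketch below assumes that this is the method behind the reported interval and finds the bounds by bisection in plain Python; `binom_cdf` and `clopper_pearson` are hypothetical helper names.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p grows, so bisection finds cdf_at(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k) = alpha / 2, i.e. P(X <= k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p),
                                     1 - alpha / 2)
    # Upper bound: P(X <= k) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


correct = 20 + 22  # diagonal of the confusion matrix
total = 74         # size of the down-sampled test set
lower, upper = clopper_pearson(correct, total)
print(round(lower, 4), round(upper, 4))  # compare with (0.4472, 0.6823)
```

The width of roughly 23.5 percentage points comes almost entirely from the small test set; the same accuracy on a larger test set would give a much tighter interval.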
No Information Rate (NIR): 50%. This is the accuracy obtained by
always predicting the most frequent class; with two equally sized
classes it is 50%.
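The "P-Value [Acc > NIR]" of 0.1477 in the output above is consistent with a one-sided exact binomial test of the observed accuracy (42 correct out of 74) against the NIR of 0.5. A minimal sketch under that assumption, in plain Python; `binom_upper_tail` is a hypothetical helper name.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


correct = 42          # 20 + 22 correct predictions
total = 74            # test-set size
no_info_rate = 0.5    # accuracy of always predicting one class

# Probability of getting at least 42 of 74 right if the true accuracy were 0.5.
p_value = binom_upper_tail(correct, total, no_info_rate)
print(round(p_value, 3))  # compare with the reported 0.1477
```

A p-value this large means that 42 of 74 correct is an outcome a classifier with no real signal would produce fairly often.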
Kappa: 0.1351. This value indicates that the predictive power of the
model is not good.
McNemar’s Test P-Value: 0.8597, which indicates that the two types of
misclassification (15 and 17 instances) occur at similar rates, i.e.,
the model is not biased toward predicting either category.
Sensitivity and Specificity: 54.05% and 59.46%. These two indicators
show that the model is weak in recognizing both the positive class
(0, non-depressed) and the negative class (1, depressed).
Positive Predictive Value (PPV) and Negative Predictive Value (NPV):
PPV is 57.14% and NPV is 56.41%, so a prediction of either class is
correct only slightly more often than a coin flip.
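PPV and NPV depend on class prevalence as well as on sensitivity and specificity. The sketch below (plain Python; `predictive_values` is a hypothetical helper) first reproduces the reported values at the balanced prevalence of 0.5, then, purely as an illustration, applies the same sensitivity and specificity at the class-0 prevalence of 0.8477 reported for the imbalanced test set in section 5.1.1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity and positive-class prevalence."""
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = (specificity * (1 - prevalence)) / (
        (1 - sensitivity) * prevalence + specificity * (1 - prevalence))
    return ppv, npv


sensitivity = 20 / 37  # class 0 correctly predicted
specificity = 22 / 37  # class 1 correctly predicted

# Balanced test set used in this section.
ppv, npv = predictive_values(sensitivity, specificity, prevalence=0.5)
print(round(ppv, 4), round(npv, 4))  # 0.5714 0.5641

# Illustration only: same rates at the imbalanced prevalence from 5.1.1.
ppv_imb, npv_imb = predictive_values(sensitivity, specificity, prevalence=0.8477)
```

With these rates, PPV rises and NPV falls as class 0 becomes more common, so the values measured on a 50/50 test set would not carry over unchanged to the original imbalanced data.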
Balanced Accuracy: 56.76%, which is only slightly above the 50% chance
level for both the positive and the negative class.
5.3.2 Decision Tree model on balanced dataset (down sampling)
## [1] "Number of rows: 1147"
## [1] "Number of columns: 35"
## [1] "Number of missing values: 0"

## Call:
## rpart(formula = depressed ~ ., data = train_data, method = "class")
## n= 176
##
## CP nsplit rel error xerror xstd
## 1 0.23863636 0 1.0000000 1.2045455 0.07378413
## 2 0.05113636 1 0.7613636 0.7954545 0.07378413
## 3 0.03409091 4 0.6022727 0.8863636 0.07488957
## 4 0.02272727 6 0.5340909 0.9204545 0.07513898
## 5 0.01136364 7 0.5113636 0.9090909 0.07506571
## 6 0.01000000 8 0.5000000 0.9090909 0.07506571
##
## Variable importance
## years_of_edu med_sickdays_hhave ed_expenses
## 14 13 9
## cons_ed cons_social nondurable_investment
## 9 7 6
## age household_size children
## 5 4 4
## cons_nondurable cons_allfood hh_children
## 4 4 4
## durable_investment ent_total_cost cons_med_children
## 3 3 3
## cons_other ed_schoolattend fs_chwholed_often
## 3 3 2
## fs_adwholed_often cons_alcohol
## 1 1
##
## Node number 1: 176 observations, complexity param=0.2386364
## predicted class=0 expected loss=0.5 P(node) =1
## class counts: 88 88
## probabilities: 0.500 0.500
## left son=2 (141 obs) right son=3 (35 obs)
## Primary splits:
## years_of_edu < 6.5 to the right, improve=7.863830, (0 missing)
## med_sickdays_hhave < 0.08333333 to the left, improve=6.731935, (0 missing)
## ed_expenses < 51.20914 to the left, improve=3.927273, (0 missing)
## cons_allfood < 186.4439 to the left, improve=3.880071, (0 missing)
## cons_social < 3.269805 to the right, improve=3.705783, (0 missing)
## Surrogate splits:
## age < 46.5 to the left, agree=0.847, adj=0.229, (0 split)
## household_size < 1.5 to the right, agree=0.841, adj=0.200, (0 split)
## children < 0.5 to the right, agree=0.830, adj=0.143, (0 split)
## med_sickdays_hhave < 6.955645 to the left, agree=0.830, adj=0.143, (0 split)
## fs_adwholed_often < 5.25 to the left, agree=0.812, adj=0.057, (0 split)
##
## Node number 2: 141 observations, complexity param=0.05113636
## predicted class=0 expected loss=0.4255319 P(node) =0.8011364
## class counts: 81 60
## probabilities: 0.574 0.426
## left son=4 (28 obs) right son=5 (113 obs)
## Primary splits:
## med_sickdays_hhave < 0.08333333 to the left, improve=5.583452, (0 missing)
## ed_expenses < 45.48365 to the left, improve=5.579527, (0 missing)
## cons_ed < 3.790304 to the left, improve=4.362527, (0 missing)
## cons_social < 0.1648248 to the right, improve=3.451186, (0 missing)
## asset_land_owned_total < 1.25 to the right, improve=2.760732, (0 missing)
## Surrogate splits:
## cons_alcohol < 4.186875 to the right, agree=0.816, adj=0.071, (0 split)
## cons_med_children < 7.206916 to the right, agree=0.809, adj=0.036, (0 split)
##
## Node number 3: 35 observations
## predicted class=1 expected loss=0.2 P(node) =0.1988636
## class counts: 7 28
## probabilities: 0.200 0.800
##
## Node number 4: 28 observations
## predicted class=0 expected loss=0.1428571 P(node) =0.1590909
## class counts: 24 4
## probabilities: 0.857 0.143
##
## Node number 5: 113 observations, complexity param=0.05113636
## predicted class=0 expected loss=0.4955752 P(node) =0.6420455
## class counts: 57 56
## probabilities: 0.504 0.496
## left son=10 (102 obs) right son=11 (11 obs)
## Primary splits:
## ed_expenses < 45.48365 to the left, improve=4.167589, (0 missing)
## ed_schoolattend < 0.7321429 to the left, improve=3.073525, (0 missing)
## cons_allfood < 150.3553 to the left, improve=3.063232, (0 missing)
## cons_ed < 3.790304 to the left, improve=3.063232, (0 missing)
## cons_nondurable < 155.5233 to the left, improve=2.758684, (0 missing)
## Surrogate splits:
## cons_ed < 3.790304 to the left, agree=0.991, adj=0.909, (0 split)
## cons_social < 10.10303 to the left, agree=0.929, adj=0.273, (0 split)
##
## Node number 10: 102 observations, complexity param=0.05113636
## predicted class=0 expected loss=0.4509804 P(node) =0.5795455
## class counts: 56 46
## probabilities: 0.549 0.451
## left son=20 (33 obs) right son=21 (69 obs)
## Primary splits:
## cons_social < 0.7607301 to the right, improve=3.100054, (0 missing)
## cons_med_children < 2.80633 to the right, improve=1.844910, (0 missing)
## asset_durable < 157.351 to the right, improve=1.593137, (0 missing)
## cons_nondurable < 55.43596 to the right, improve=1.551191, (0 missing)
## cons_allfood < 27.7962 to the right, improve=1.551191, (0 missing)
## Surrogate splits:
## cons_nondurable < 55.43596 to the right, agree=0.902, adj=0.697, (0 split)
## cons_allfood < 27.7962 to the right, agree=0.902, adj=0.697, (0 split)
## ent_total_cost < 0.03336535 to the right, agree=0.892, adj=0.667, (0 split)
## durable_investment < 42.50397 to the right, agree=0.892, adj=0.667, (0 split)
## nondurable_investment < 0.7240282 to the right, agree=0.892, adj=0.667, (0 split)
##
## Node number 11: 11 observations
## predicted class=1 expected loss=0.09090909 P(node) =0.0625
## class counts: 1 10
## probabilities: 0.091 0.909
##
## Node number 20: 33 observations
## predicted class=0 expected loss=0.2727273 P(node) =0.1875
## class counts: 24 9
## probabilities: 0.727 0.273
##
## Node number 21: 69 observations, complexity param=0.03409091
## predicted class=1 expected loss=0.4637681 P(node) =0.3920455
## class counts: 32 37
## probabilities: 0.464 0.536
## left son=42 (61 obs) right son=43 (8 obs)
## Primary splits:
## hh_children < 2.5 to the left, improve=2.077037, (0 missing)
## cons_ed < 0.3002882 to the left, improve=1.627315, (0 missing)
## fs_chwholed_often < 0.3434622 to the right, improve=1.620659, (0 missing)
## cons_med_children < 0.6880889 to the right, improve=1.489211, (0 missing)
## years_of_edu < 10.5 to the right, improve=1.196034, (0 missing)
## Surrogate splits:
## cons_ed < 0.3002882 to the left, agree=0.971, adj=0.750, (0 split)
## cons_other < 20.05925 to the left, agree=0.971, adj=0.750, (0 split)
## ed_schoolattend < 0.25 to the left, agree=0.971, adj=0.750, (0 split)
## ed_expenses < 3.603458 to the left, agree=0.957, adj=0.625, (0 split)
## nondurable_investment < 0.854153 to the left, agree=0.957, adj=0.625, (0 split)
##
## Node number 42: 61 observations, complexity param=0.03409091
## predicted class=0 expected loss=0.4918033 P(node) =0.3465909
## class counts: 31 30
## probabilities: 0.508 0.492
## left son=84 (40 obs) right son=85 (21 obs)
## Primary splits:
## age < 26.5 to the right, improve=1.0370410, (0 missing)
## household_size < 4.5 to the right, improve=1.0089660, (0 missing)
## med_sickdays_hhave < 1.6844 to the left, improve=0.9841780, (0 missing)
## years_of_edu < 10.5 to the right, improve=0.7503067, (0 missing)
## marital_status < 0.5 to the right, improve=0.5608942, (0 missing)
## Surrogate splits:
## household_size < 4.5 to the right, agree=0.738, adj=0.238, (0 split)
## children < 2.5 to the right, agree=0.721, adj=0.190, (0 split)
## fs_chwholed_often < 0.2787356 to the right, agree=0.721, adj=0.190, (0 split)
## fs_meat < 1.5 to the right, agree=0.689, adj=0.095, (0 split)
## med_sickdays_hhave < 2.018315 to the left, agree=0.689, adj=0.095, (0 split)
##
## Node number 43: 8 observations
## predicted class=1 expected loss=0.125 P(node) =0.04545455
## class counts: 1 7
## probabilities: 0.125 0.875
##
## Node number 84: 40 observations, complexity param=0.02272727
## predicted class=0 expected loss=0.425 P(node) =0.2272727
## class counts: 23 17
## probabilities: 0.575 0.425
## left son=168 (20 obs) right son=169 (20 obs)
## Primary splits:
## cons_med_children < 1.249358 to the right, improve=1.2500000, (0 missing)
## age < 35.5 to the left, improve=0.8632832, (0 missing)
## children < 3.5 to the left, improve=0.8632832, (0 missing)
## fs_chwholed_often < 0.3642956 to the left, improve=0.2317043, (0 missing)
## household_size < 6.5 to the left, improve=0.1928571, (0 missing)
## Surrogate splits:
## children < 3.5 to the left, agree=0.825, adj=0.65, (0 split)
## med_sickdays_hhave < 1.525668 to the right, agree=0.800, adj=0.60, (0 split)
## fs_chwholed_often < 0.3642956 to the left, agree=0.775, adj=0.55, (0 split)
## household_size < 5.5 to the left, agree=0.725, adj=0.45, (0 split)
## years_of_edu < 8.5 to the left, agree=0.625, adj=0.25, (0 split)
##
## Node number 85: 21 observations
## predicted class=1 expected loss=0.3809524 P(node) =0.1193182
## class counts: 8 13
## probabilities: 0.381 0.619
##
## Node number 168: 20 observations
## predicted class=0 expected loss=0.3 P(node) =0.1136364
## class counts: 14 6
## probabilities: 0.700 0.300
##
## Node number 169: 20 observations, complexity param=0.01136364
## predicted class=1 expected loss=0.45 P(node) =0.1136364
## class counts: 9 11
## probabilities: 0.450 0.550
## left son=338 (7 obs) right son=339 (13 obs)
## Primary splits:
## age < 42.5 to the right, improve=0.3175824, (0 missing)
## children < 5 to the left, improve=0.1500000, (0 missing)
## household_size < 6.5 to the left, improve=0.1500000, (0 missing)
## years_of_edu < 9.5 to the left, improve=0.1000000, (0 missing)
## cons_med_children < 1.136585 to the left, improve=0.1000000, (0 missing)
## Surrogate splits:
## children < 3 to the left, agree=0.80, adj=0.429, (0 split)
## cons_med_children < 0.1501441 to the left, agree=0.80, adj=0.429, (0 split)
## fs_chwholed_often < 0.1666667 to the left, agree=0.80, adj=0.429, (0 split)
## cons_nondurable < 49.48167 to the right, agree=0.75, adj=0.286, (0 split)
## asset_durable < 69.1864 to the right, agree=0.75, adj=0.286, (0 split)
##
## Node number 338: 7 observations
## predicted class=0 expected loss=0.4285714 P(node) =0.03977273
## class counts: 4 3
## probabilities: 0.571 0.429
##
## Node number 339: 13 observations
## predicted class=1 expected loss=0.3846154 P(node) =0.07386364
## class counts: 5 8
## probabilities: 0.385 0.615
##
## n= 176
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 176 88 0 (0.50000000 0.50000000)
## 2) years_of_edu>=6.5 141 60 0 (0.57446809 0.42553191)
## 4) med_sickdays_hhave< 0.08333333 28 4 0 (0.85714286 0.14285714) *
## 5) med_sickdays_hhave>=0.08333333 113 56 0 (0.50442478 0.49557522)
## 10) ed_expenses< 45.48365 102 46 0 (0.54901961 0.45098039)
## 20) cons_social>=0.7607301 33 9 0 (0.72727273 0.27272727) *
## 21) cons_social< 0.7607301 69 32 1 (0.46376812 0.53623188)
## 42) hh_children< 2.5 61 30 0 (0.50819672 0.49180328)
## 84) age>=26.5 40 17 0 (0.57500000 0.42500000)
## 168) cons_med_children>=1.249358 20 6 0 (0.70000000 0.30000000) *
## 169) cons_med_children< 1.249358 20 9 1 (0.45000000 0.55000000)
## 338) age>=42.5 7 3 0 (0.57142857 0.42857143) *
## 339) age< 42.5 13 5 1 (0.38461538 0.61538462) *
## 85) age< 26.5 21 8 1 (0.38095238 0.61904762) *
## 43) hh_children>=2.5 8 1 1 (0.12500000 0.87500000) *
## 11) ed_expenses>=45.48365 11 1 1 (0.09090909 0.90909091) *
## 3) years_of_edu< 6.5 35 7 1 (0.20000000 0.80000000) *
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 19 15
## 1 18 22
##
## Accuracy : 0.5541
## 95% CI : (0.4339, 0.6698)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 0.2080
##
## Kappa : 0.1081
##
## Mcnemar's Test P-Value : 0.7277
##
## Sensitivity : 0.5135
## Specificity : 0.5946
## Pos Pred Value : 0.5588
## Neg Pred Value : 0.5500
## Prevalence : 0.5000
## Detection Rate : 0.2568
## Detection Prevalence : 0.4595
## Balanced Accuracy : 0.5541
##
## 'Positive' Class : 0
##
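5.3.2 Summary of Decision Tree model on balanced dataset (down sampling)
Accuracy: 55.41%, which is not significantly better than the No
Information Rate of 50% (p = 0.2080).
Kappa: 0.1081, which indicates only slight agreement beyond chance; the
model’s predictive ability is poor.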
Root node: The first split is on years_of_edu (at 6.5 years), which
suggests that this variable is an important factor in distinguishing
between the two categories (depressed or not).
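The "improve" value printed for this root split (7.863830) can be reproduced from the class counts in the output, assuming the Gini index is the splitting criterion (the rpart default for classification trees). The sketch below is plain Python and the helper names are illustrative.

```python
def gini_mass(n0, n1):
    """Node size times two-class Gini impurity: n * 2 * p0 * p1."""
    n = n0 + n1
    return 2 * n0 * n1 / n


def split_improvement(parent, left, right):
    """Decrease in size-weighted Gini impurity produced by a split."""
    return gini_mass(*parent) - gini_mass(*left) - gini_mass(*right)


# Class counts (class 0, class 1) taken from nodes 1, 2 and 3 of the output.
improve = split_improvement(parent=(88, 88), left=(81, 60), right=(7, 28))
print(round(improve, 4))  # 7.8638
```

The gain comes mostly from the right child, years_of_edu < 6.5, where 28 of 35 respondents are in class 1.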
Second-level split: The next split is on med_sickdays_hhave, which
suggests that health status is also an important factor related to
depression status.
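Deeper nodes: Variables such as ed_expenses, cons_social, hh_children,
and age are used in deeper nodes, but these splits bring little
improvement. The cross-validated error (xerror) in the CP table is
lowest after the first split (0.795) and rises for larger trees, which
suggests that the deeper splits mostly fit noise in the small training
set (176 observations).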
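The CP table in the output above can be used to judge how large the tree should be. The sketch below copies that table into plain Python and picks the row with the lowest cross-validated error, then applies the common one-standard-error rule. It illustrates the selection logic only and is not a step the report itself performs.

```python
# Rows copied from the CP table in the rpart output above:
# (CP, nsplit, rel_error, xerror, xstd)
cp_table = [
    (0.23863636, 0, 1.0000000, 1.2045455, 0.07378413),
    (0.05113636, 1, 0.7613636, 0.7954545, 0.07378413),
    (0.03409091, 4, 0.6022727, 0.8863636, 0.07488957),
    (0.02272727, 6, 0.5340909, 0.9204545, 0.07513898),
    (0.01136364, 7, 0.5113636, 0.9090909, 0.07506571),
    (0.01000000, 8, 0.5000000, 0.9090909, 0.07506571),
]

root_error = 88 / 176  # misclassification rate of the root node (88 of 176)

# Row with the lowest cross-validated error.
best = min(cp_table, key=lambda row: row[3])

# One-standard-error rule: the smallest tree whose xerror is within one
# xstd of the minimum (the table is ordered by increasing tree size).
threshold = best[3] + best[4]
one_se = next(row for row in cp_table if row[3] <= threshold)

best_nsplit = best[1]
one_se_nsplit = one_se[1]
cv_error_rate = best[3] * root_error  # xerror is relative to the root error

print(best_nsplit, one_se_nsplit, round(cv_error_rate, 4))  # 1 1 0.3977
```

Both criteria select the tree with a single split on years_of_edu. In R, a tree of that size could be obtained with rpart's prune() function by passing the corresponding CP value.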